Use `NonZeroU64` to optimize `encoded_len_varint` #1192

mzabaluev · 2024-11-23T20:50:39Z

Give the compiler all the leverage it needs to optimize `encoded_len_varint`:

- Construct a `NonZeroU64` to count leading zeros, as that can be faster on many platforms;
- Use `ilog2` instead of a handwritten expression to compute the base 2 logarithm, as the core library developers and the compiler are probably in the best position to fine-tune it for all supported platforms.

With the varint benchmarks, I see slight improvements (1-3%) or no reproducible performance changes on 11th gen Intel Core and Mac M3.

The leading zeros count may perform better on many architectures when the zero case is excluded. Also use `ilog2` as shorthand for the leading zeros trick, because it states more clearly what we mean to compute and should ideally be optimized by the compiler.
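For readers without the diff at hand, here is a minimal sketch of the idea, not the actual patch. The function names `encoded_len_varint_clz` and `encoded_len_varint_ilog2` are hypothetical, and the constants assume the usual varint layout of 7 payload bits per byte. The "before" form is an assumed illustration of the leading zeros trick the description refers to. The sketch uses the safe `NonZeroU64::new` constructor, which may differ from what the change itself does.

```rust
use std::num::NonZeroU64;

/// Assumed "handwritten" form: derive floor(log2(value | 1)) from the
/// leading zeros count, then map the bit position to a byte count.
pub fn encoded_len_varint_clz(value: u64) -> usize {
    // For a non-zero u64, `leading_zeros() ^ 63` equals `63 - leading_zeros()`.
    let log2value = (value | 1).leading_zeros() ^ 63;
    ((log2value * 9 + 73) / 64) as usize
}

/// Sketch of the NonZeroU64 + ilog2 form described above.
pub fn encoded_len_varint_ilog2(value: u64) -> usize {
    // `value | 1` is never zero, so the constructor cannot return None.
    let nonzero = NonZeroU64::new(value | 1).expect("value | 1 is non-zero");
    let log2value = nonzero.ilog2();
    // (log2 * 9 + 73) / 64 equals log2 / 7 + 1 for every log2 in 0..=63.
    ((log2value * 9 + 73) / 64) as usize
}
```

Both functions should agree on every input; the second states the intent (a base 2 logarithm of a value known to be non-zero) directly in the types.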
